Rigid Transformations for Stabilized Lower Dimensional Space to Support Subsurface Uncertainty Quantification and Interpretation
Subsurface datasets inherently possess big data characteristics such as vast
volume, diverse features, and high sampling speeds, further compounded by the
curse of dimensionality from various physical, engineering, and geological
inputs. Among the existing dimensionality reduction (DR) methods, nonlinear
dimensionality reduction (NDR) methods, especially Metric-multidimensional
scaling (MDS), are preferred for subsurface datasets due to their inherent
complexity. While MDS retains intrinsic data structure and quantifies
uncertainty, its limitations include solutions that are stable only up to
Euclidean transformations and the absence of an out-of-sample point (OOSP)
extension. To enhance subsurface inferential and machine learning workflows,
datasets must be transformed into stable, reduced-dimension representations
that accommodate OOSP.
Our solution employs rigid transformations to obtain a stabilized,
Euclidean-invariant representation of the lower dimensional space (LDS). By
computing an MDS input dissimilarity
matrix and applying rigid transformations to multiple realizations, we ensure
transformation invariance and integrate OOSP. This process leverages a convex
hull algorithm and incorporates a loss function and normalized stress for
distortion quantification. We validate our approach with synthetic data,
varying distance metrics, and real-world wells from the Duvernay Formation.
Results confirm our method's efficacy in achieving consistent LDS
representations. Furthermore, our proposed "stress ratio" (SR) metric provides
insight into uncertainty, beneficial for model adjustments and inferential
analysis. Consequently, our workflow promises enhanced repeatability and
comparability in NDR for subsurface energy resource engineering and associated
big data workflows.
Comment: 30 pages, 17 figures, Submitted to Computational Geosciences Journal
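The stabilization idea described above — embed a dissimilarity matrix with metric MDS, then rigidly align realizations so they agree up to rotation, reflection, and translation — can be sketched in a few lines. This is a minimal illustration, not the paper's workflow: the toy data, the plain Euclidean dissimilarities, and the use of a known rotation to emulate a second realization are all assumptions for the example; the convex hull step and OOSP handling are not reproduced.

```python
import numpy as np
from sklearn.manifold import MDS
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                          # toy samples (assumed data)
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise dissimilarity matrix

# Metric MDS embeds the dissimilarity matrix into a 2-D LDS.
A = MDS(n_components=2, dissimilarity="precomputed",
        random_state=0).fit_transform(D)

# Emulate a second realization that differs only by a Euclidean transformation.
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
B = A @ R_true + np.array([3.0, -1.0])                # rotated and translated copy

# Rigid alignment: center both realizations, then solve the orthogonal
# Procrustes problem to remove the rotation/reflection ambiguity.
A_c, B_c = A - A.mean(0), B - B.mean(0)
R, _ = orthogonal_procrustes(B_c, A_c)
B_aligned = B_c @ R                                   # now matches A_c

# Normalized stress quantifies the distortion of an embedding.
def normalized_stress(D, Y):
    d = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
    return np.sqrt(np.sum((D - d) ** 2) / np.sum(D ** 2))
```

Because the alignment is restricted to orthogonal transformations (no scaling), the aligned realizations share one coordinate frame, which is what makes repeated NDR runs comparable.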
Mitigation of Spatial Nonstationarity with Vision Transformers
Spatial nonstationarity, the location variance of features' statistical
distributions, is ubiquitous in many natural settings. For example, in
geological reservoirs rock matrix porosity varies vertically due to
geomechanical compaction trends, in mineral deposits grades vary due to
sedimentation and concentration processes, in hydrology rainfall varies due to
the atmosphere and topography interactions, and in metallurgy crystalline
structures vary due to differential cooling. Conventional geostatistical
modeling workflows rely on the assumption of stationarity to model spatial
features for geostatistical inference. Nevertheless, this assumption is often
unrealistic for nonstationary spatial data, which has motivated a variety of
nonstationary spatial modeling workflows such
as trend and residual decomposition, cosimulation with secondary features, and
spatial segmentation and independent modeling over stationary subdomains. The
advent of deep learning technologies has enabled new workflows for modeling
spatial relationships. However, there is a paucity of demonstrated best
practice and general guidance on mitigation of spatial nonstationarity with
deep learning in the geospatial context. We demonstrate the impact of two
common types of geostatistical spatial nonstationarity on deep learning model
prediction performance and propose the mitigation of such impacts using
self-attention (vision transformer) models. We demonstrate the utility of
vision transformers for the mitigation of nonstationarity with relative errors
as low as 10%, exceeding the performance of alternative deep learning methods
such as convolutional neural networks. We establish best practice by
demonstrating the ability of self-attention networks for modeling large-scale
spatial relationships in the presence of commonly observed geospatial
nonstationarity.
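The reason self-attention can capture trend-scale relationships is that every patch attends to every other patch in a single layer, rather than relying on stacked local receptive fields as convolutions do. A minimal numpy sketch of the patch-embedding and scaled dot-product self-attention steps of a vision transformer is below — this is not the paper's architecture, and the 16x16 "porosity map", patch size, and random weights are illustrative assumptions only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: each patch forms a query, compares it
    against every patch's key, and mixes all values by the resulting weights —
    so long-range (e.g., compaction-trend) relationships are modeled directly."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
img = rng.normal(size=(16, 16))          # toy nonstationary property map (assumed)

# Split into 4x4 patches and flatten: the tokenization step of a ViT.
patches = img.reshape(4, 4, 4, 4).transpose(0, 2, 1, 3).reshape(16, 16)

d = 8                                    # embedding dimension (assumed)
W_embed = rng.normal(size=(16, d))
X = patches @ W_embed                    # (16 patches, d) linear patch embedding

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)      # (16, d); each row mixes all patches
```

Note that the attention weight matrix is dense over all patch pairs, so a vertically varying trend influences every output token in one step.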
Optimal Placement of Public Electric Vehicle Charging Stations Using Deep Reinforcement Learning
The placement of charging stations in areas with developing charging
infrastructure is a critical component of the future success of electric
vehicles (EVs). In Albany County in New York, the expected rise in the EV
population requires additional charging stations to maintain a sufficient level
of efficiency across the charging infrastructure. A novel application of
Reinforcement Learning (RL) is able to find optimal locations for new charging
stations given the predicted charging demand and current charging locations.
The most important factors that influence charging demand prediction include
the conterminous traffic density, EV registrations, and proximity to certain
types of public buildings. The proposed RL framework can be refined and applied
to cities across the world to optimize charging station placement.
Comment: 25 pages with 12 figures. Shankar Padmanabhan and Aidan Petratos
provided equal contributions.
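Framing siting as sequential decisions is what makes RL applicable here: the state is the set of stations placed so far, an action adds a station, and the reward is the demand newly served. A tabular Q-learning sketch on a toy five-site corridor is below — the demand values, coverage rule, and hyperparameters are illustrative assumptions, not the paper's demand model or the Albany County data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy problem: 5 candidate sites on a corridor with predicted charging
# demand; a station serves its own site and its immediate neighbors.
demand = np.array([1.0, 5.0, 2.0, 4.0, 1.0])
n_sites, n_place = len(demand), 2

def coverage(placed):
    covered = set()
    for s in placed:
        covered |= {s - 1, s, s + 1} & set(range(n_sites))
    return float(demand[sorted(covered)].sum()) if covered else 0.0

# Tabular Q-learning: state = frozenset of placed stations, action = next site,
# reward = marginal demand newly covered by that placement.
Q = {}
q = lambda s: Q.setdefault(s, np.zeros(n_sites))

def pick(state, eps):
    avail = [a for a in range(n_sites) if a not in state]
    if rng.random() < eps:
        return int(rng.choice(avail))
    qs = q(state).copy()
    qs[list(state)] = -np.inf            # never rebuild on an occupied site
    return int(np.argmax(qs))

alpha, gamma = 0.2, 1.0
for _ in range(3000):
    state = frozenset()
    for _ in range(n_place):
        a = pick(state, eps=0.2)
        nxt = state | {a}
        r = coverage(nxt) - coverage(state)
        target = r if len(nxt) == n_place else r + gamma * q(nxt).max()
        q(state)[a] += alpha * (target - q(state)[a])
        state = nxt

# Greedy rollout of the learned policy gives the proposed station locations.
state = frozenset()
for _ in range(n_place):
    state |= {pick(state, eps=0.0)}
```

Because rewards and transitions are deterministic in this toy, the Q-table converges quickly; a realistic version would replace `coverage` with a learned demand-prediction model over traffic density, EV registrations, and building proximity.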
Texas orphan well road traversal
Contains data used in the FRI Summer Fellowship 2021 Plug and Abandon project: shapefiles storing data on the roads of Texas as well as orphaned wells within each county.
Fair train-test split in machine learning: Mitigating spatial autocorrelation for improved prediction accuracy
Highlights
• Our data split method handles spatial autocorrelation and imposes prediction fairness.
• The sets impose fair algorithms with similar difficulty in all machine learning steps.
• Kriging variance is a surrogate of spatial prediction difficulty.
• The resulting training and test sets are compatible with any machine learning model.
Machine learning supports prediction and inference in multivariate and complex datasets where observations are spatially related to one another. Frequently, these datasets exhibit spatial autocorrelation that violates the assumption of independently and identically distributed data. Overlooking this correlation results in over-optimistic models that fail to account for the geographical configuration of the data. Furthermore, although different data split methods account for spatial autocorrelation, these methods are inflexible, and the parameter training and hyperparameter tuning of the machine learning model are set with a different prediction difficulty than the planned real-world use of the model. In other words, it is an unfair training-testing process. We present a novel method that considers spatial autocorrelation and the planned real-world use of the spatial prediction model to design a fair train-test split.
Demonstrations include two examples of the planned real-world use of the model using a realistic multivariate synthetic dataset and the analysis of 148 wells from an undisclosed Equinor play. First, the workflow applies the semivariogram model of the target to compute the simple kriging variance as a proxy of spatial estimation difficulty based on the spatial data configuration. Second, the workflow employs modified rejection sampling to generate a test set with prediction difficulty similar to the planned real-world use of the model. Third, we compare 100 realizations of the test set to the model's planned real-world use, using probability distributions and two divergence metrics: the Jensen-Shannon distance and the mean squared error. The analysis ranks the spatial fair train-test split method as the only one to replicate the difficulty (i.e., kriging variance), compared to the validation set approach and spatial cross-validation. Moreover, the proposed method outperforms the validation set approach, yielding a smaller mean percentage error when predicting a target feature in an undisclosed Equinor play using a random forest model.
The resulting outputs are training and test sets ready for model fit and assessment with any machine learning algorithm. Thus, the proposed workflow offers spatially aware sets ready for predictive machine learning problems, with estimation difficulty similar to the planned real-world use of the model, and compatible with any spatial data analysis task.
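The first two steps of the workflow — simple kriging variance as a data-configuration-only proxy for prediction difficulty, then rejection sampling toward a target difficulty — can be sketched as follows. This is a minimal illustration under stated assumptions: the well locations, the exponential covariance model, the Gaussian acceptance kernel, and the use of the median difficulty as the "planned real-world use" target are all placeholders, not the paper's data or exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: 60 well locations in a 1000 x 1000 area with an
# exponential covariance model (covariance = sill - semivariogram).
sill, vrange = 1.0, 300.0
cov = lambda h: sill * np.exp(-3.0 * h / vrange)

wells = rng.uniform(0, 1000, size=(60, 2))

def sk_variance(x0, data_xy):
    """Simple kriging variance at x0 given conditioning locations: a proxy
    for spatial prediction difficulty that uses only the data configuration."""
    d = np.linalg.norm(data_xy - x0, axis=1)
    D = np.linalg.norm(data_xy[:, None] - data_xy[None, :], axis=-1)
    w = np.linalg.solve(cov(D) + 1e-9 * np.eye(len(data_xy)), cov(d))
    return sill - w @ cov(d)

# Leave-one-out difficulty of each candidate test well.
difficulty = np.array([sk_variance(wells[i], np.delete(wells, i, axis=0))
                       for i in range(len(wells))])

# Rejection sampling toward a target difficulty (here the median, standing in
# for the difficulty of the planned real-world use of the model).
target = np.median(difficulty)
bw = difficulty.std() + 1e-12
accept_p = np.exp(-((difficulty - target) ** 2) / (2 * bw ** 2))
test_idx = np.where(rng.random(len(wells)) < accept_p)[0]
train_idx = np.setdiff1d(np.arange(len(wells)), test_idx)
```

The resulting split biases the test set toward wells whose kriging variance matches the intended deployment difficulty, which is the fairness property the abstract describes.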
Texas oil and gas well database
Working database of Texas Railroad Commission data, EIA oil and gas prices, and geological data.
A Machine Learning Workflow to Support the Identification of Subsurface Resource Analogs
Identifying subsurface resource analogs from mature subsurface datasets is vital for developing new prospects, because information on new prospects is often initially limited or absent. Traditional methods for selecting these analogs, executed by domain experts, face challenges due to subsurface datasets' high complexity, noise, and dimensionality. This article aims to simplify this process by introducing an objective geostatistics-based machine learning workflow for analog selection. Our innovative workflow offers a systematic and unbiased solution, incorporating a new dissimilarity metric and scoring metrics, group consistency, and pairwise similarity scores. These elements effectively account for spatial and multivariate data relationships, measuring similarities within and between groups in reduced dimensional spaces. Our workflow begins with multidimensional scaling from inferential machine learning, utilizing our dissimilarity metric to obtain data representations in a reduced dimensional space. Following this, density-based spatial clustering of applications with noise (DBSCAN) identifies analog clusters and spatial analogs in the reduced space. Then, our scoring metrics assist in quantifying and identifying analogous data samples, while providing useful diagnostics for resource exploration. We demonstrate the efficacy of this workflow with wells from the Duvernay Formation and a test scenario incorporating various well types common in unconventional reservoirs, including infill, outlier, sparse, and centered wells. Through this application, we successfully identified and grouped analog clusters of test well samples based on geological properties and cumulative gas production, showcasing the potential of our proposed workflow for practical use in the field.
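The two core steps of the workflow — an MDS embedding of a dissimilarity matrix followed by DBSCAN clustering in the reduced space — can be sketched with standard library calls. This is an illustrative sketch only: the toy well properties and the plain Euclidean dissimilarity on standardized features are assumptions standing in for the article's proposed spatial/multivariate dissimilarity metric and scoring metrics.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.cluster import DBSCAN
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)

# Assumed toy "wells": two property groups standing in for analog populations
# (e.g., porosity, thickness, cumulative gas production) — illustrative only.
group_a = rng.normal([0.10, 20.0, 1.0], [0.01, 2.0, 0.1], size=(15, 3))
group_b = rng.normal([0.25, 60.0, 5.0], [0.01, 2.0, 0.1], size=(15, 3))
wells = np.vstack([group_a, group_b])

# Standardize so no single feature dominates the dissimilarity, then build the
# pairwise dissimilarity matrix (Euclidean here; the article uses its own metric).
z = (wells - wells.mean(0)) / wells.std(0)
D = squareform(pdist(z))

# Step 1: metric MDS embeds the wells in a reduced dimensional space.
lds = MDS(n_components=2, dissimilarity="precomputed",
          random_state=0).fit_transform(D)

# Step 2: DBSCAN finds analog clusters in the reduced space;
# the label -1 marks outlier wells that belong to no analog group.
labels = DBSCAN(eps=1.0, min_samples=3).fit_predict(lds)
```

A practical version would replace the Euclidean dissimilarity with the article's metric and follow the clustering with the group-consistency and pairwise-similarity scoring the abstract describes.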